nlp_architect.pipelines.spacy_bist.SpacyBISTParser

class nlp_architect.pipelines.spacy_bist.SpacyBISTParser(verbose=False, spacy_model='en', bist_model=None)

Main class for dependency parsing with the spaCy-BIST parser.

Parameters
  • verbose (bool, optional) – Controls output verbosity.

  • spacy_model (str, optional) – spaCy model to use (see https://spacy.io/api/top-level#spacy.load).

  • bist_model (str, optional) – Path to a .model file to load. Defaults to the pre-trained model.
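A minimal usage sketch of the constructor and the two public methods, assuming `nlp_architect` is installed together with a spaCy model that the default `spacy_model='en'` can load. Only calls documented on this page are used; the helper name `demo_parse` is hypothetical, and the import is deferred inside it so that defining the helper does not require the package.

```python
from typing import Any, List, Tuple


def demo_parse(text: str) -> Tuple[Any, List[Any]]:
    """Hypothetical helper: parse `text` and collect its CoNLL sentences.

    Assumes nlp_architect and a compatible spaCy model are installed.
    With bist_model=None the parser falls back to its pre-trained model.
    """
    # Deferred import so this module loads even without nlp_architect.
    from nlp_architect.pipelines.spacy_bist import SpacyBISTParser

    parser = SpacyBISTParser(verbose=False, bist_model=None)

    # parse() returns a CoreNLPDoc; include token and document text in it.
    doc = parser.parse(text, show_tok=True, show_doc=True)

    # to_conll() yields one list of ConllEntry objects per sentence.
    sentences = list(parser.to_conll(text))
    return doc, sentences
```

Calling `demo_parse("The dog barks.")` would return the annotated document and a list with one entry per sentence.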

__init__(verbose=False, spacy_model='en', bist_model=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([verbose, spacy_model, bist_model])

Initialize self.

parse(doc_text[, show_tok, show_doc])

Parse a raw text document.

to_conll(doc_text)

Converts a document to CoNLL format with spaCy POS tags.

Attributes

dir

dir = PosixPath('/Users/pizsak/nlp-architect/cache/bist-pretrained')

parse(doc_text, show_tok=True, show_doc=True)

Parse a raw text document.

Parameters
  • doc_text (str) – Raw document text.

  • show_tok (bool, optional) – Specifies whether to include token text in output.

  • show_doc (bool, optional) – Specifies whether to include document text in output.

Returns

The annotated document.

Return type

CoreNLPDoc

to_conll(doc_text)

Converts a document to CoNLL format with spaCy POS tags.

Parameters

doc_text (str) – Raw document text.

Yields

list of ConllEntry – The next sentence in the document in CoNLL format.
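Each yielded sentence can be serialized as CoNLL text: one tab-separated token per line, with a blank line between sentences. The sketch below assumes the 10-column CoNLL-X layout (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL) and uses a hypothetical `Token` dataclass as a stand-in. The attribute names of the real `ConllEntry` are not documented here, so they would need to be mapped onto these fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Token:
    """Hypothetical stand-in for a CoNLL entry (not nlp_architect's ConllEntry)."""
    id: int
    form: str
    lemma: str
    cpos: str
    pos: str
    head: int
    rel: str
    feats: Optional[str] = None


def to_conll_text(sentences: Iterable[List[Token]]) -> str:
    """Render sentences as CoNLL-X: 10 tab-separated columns, '_' for empty fields."""
    blocks = []
    for sentence in sentences:
        lines = []
        for t in sentence:
            cols = [
                str(t.id), t.form, t.lemma, t.cpos, t.pos,
                t.feats or "_", str(t.head), t.rel,
                "_", "_",  # PHEAD and PDEPREL are left empty
            ]
            lines.append("\t".join(cols))
        blocks.append("\n".join(lines))
    # Sentences are separated by a blank line; the output ends with a newline.
    return "\n\n".join(blocks) + "\n"
```

Writing `_` for unknown fields keeps every row at exactly ten columns, which CoNLL-X readers generally expect.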